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A  Brain-Machine-Brain  Interface  for  Rewiring  of  Cortical  Circuitry  after 

Traumatic  Brain  Injury 

Principal  Investigator:  Pedram  Mohseni,  Ph.D. 

Department  of  Electrical  Engineering  and  Computer  Science,  Case  Western  Reserve  University 

Co-Principal  Investigator:  Randolph  J.  Nudo,  Ph.D. 

Department  of  Molecular  and  Integrative  Physiology,  Kansas  University  Medical  Center 

Introduction 

The  goal  of  this  project  is  to  use  an  implantable  brain-machine-brain  interface  to  enhance  behavioral 
recovery  after  traumatic  brain  injury  (TBI)  by  reshaping  long-range  intracortical  connectivity  patterns.  We 
hypothesize  that  artificial  synchronous  activation  of  distant  cortical  locations  will  encourage  spontaneously 
sprouting  axons  to  migrate  toward  and  terminate  in  the  coupled  region,  and  that  such  directed  sprouting  can  aid 
in  functional  recovery. 

Body 

In  this  section  of  the  annual  report,  we  describe  the  research  accomplishments  associated  with  each  task 
outlined  in  the  approved  Statement  of  Work. 

L  Electronic s  Development 

For  Tasks  1.1  and  1.2,  we  decided  to  use  the  same  integrated  circuit  (IC)  previously  developed  for 
rodent  studies  in  constructing  the  microsystem  for  non-human  primate  studies.  This  is  because  the  capabilities 
of  the  rat  IC  (e.g.,  spike-stimulus  time  delay  range,  stimulus  current  parameters,  etc)  are  deemed  to  be  suitable 
for  the  initial  round  of  experiments  with  non-human  primates.  Further,  we  already  have  >10  functional  ICs  from 
the  original  round  of  IC  fabrication,  obviating  a  need  for  re-fabricating  and  re-characterizing  the  IC  for  non¬ 
human  primate  studies. 

2,  Microsystem.  Packaging 

For  Tasks  2.1  and  2.2,  NeuroNexus  Technologies  (Ann  Arbor,  MI)  was  identified  as  a  reliable 
commercial  supplier  for  silicon-microfabricated  microprobes  for  recording  and  stimulation.  Further,  Flexible 
Circuit  Technologies  (Plymouth,  MN)  was  identified  as  a  reliable  commercial  supplier  of  miniature,  rigid-flex 
substrates.  We  have  also  previously  worked  with  ProtoConnect  (Ann  Arbor,  MI)  for  die  attachment, 
encapsulation,  wire  bonding,  and  assembly  of  all  the  components  onto  the  substrate.  Efforts  are  now  focused  on 
modifying  the  microsystem  assembly  and  packaging  for  ambulatory  experiments  with  non-human  primates. 
Specifically,  the  goal  is  to  fit  the  revised  microsystem  inside  a  custom-designed  plastic  chamber  with  internal 
dimensions  of  18  mm  x  18  mm  that  will  be  affixed  to  the  skull  of  a  squirrel  monkey.  We  have  also  decided  to 
move  the  battery  and  the  wireless  transceiver  module  to  a  backpack  device  (mounted  on  the  back  of  the 
monkey)  in  order  to  further  simplify  the  design  of  the  microsystem  inside  the  skull-mounted  chamber.  The 
microsystem  is  envisioned  to  connect  to  two  multi-site,  chronically  implanted  recording  and  stimulating 
microelectrodes  (NeuroNexus  Technologies,  Ann  Arbor,  MI)  via  two  microconnectors  (Omnetics  Corp., 
Minneapolis,  MN)  in  plug-and-play  fashion.  Acrylic  will  be  used  as  a  biocompatible  encapsulant,  whenever 
necessary.  As  stated  in  the  annual  report  of  the  Partnering  PI,  Prof.  Randy  Nudo,  we  have  already  completed  the 
design  and  fabrication  of  the  plastic  chambers  customized  to  fit  the  shape  of  the  monkey  skull.  This  was  a 
collaborative  effort  between  the  engineering  group  at  CWRU  and  the  neurobiological  team  at  KUMC. 

In  this  section  of  the  annual  report,  we  describe  the  research  accomplishments  associated  with  tasks  from 
previous  phases  as  outlined  in  the  approved  Statement  of  Work. 

Phase  I  (1-12  months).  Task  1  (Electronics  Development) 


4 


1.3  Design  a  neural  signal  processor  for  real-time  stimulus  artifact  rejection  using  template  subtraction 
technique  with  power  consumption  <5  pW. 

An  infinite  impulse  response  (DR)  temporal  filtering  technique  for  real-time  stimulus  artifact  rejection 
(SAR)  based  on  template  subtraction  was  developed.  A  system  architecture  for  the  HR  SAR  algorithm  was  also 
developed,  and  the  operation  of  the  algorithm  with  fixed-point  computation  was  analyzed  to  obtain  the  number 
of  bits  for  the  internal  nodes  of  the  system,  considering  dynamic  range  and  fraction  length  requirements  for 
optimum  performance.  Further,  memory  initialization  with  the  first  recorded  stimulus  artifact  was  implemented 
to  significantly  decrease  the  HR  system  response  time,  especially  when  artifacts  were  highly  reproducible  in 
consecutive  stimulation  cycles.  The  proposed  system  architecture  was  hardware-implemented  on  a  field- 
programmable  gate  array  (FPGA)  and  tested  using  two  sets  of  prerecorded  neural  data  from  a  rat  and  an  Aplysia 
californica  (a  marine  sea  slug)  obtained  from  two  different  laboratories.  The  measured  results  from  the  FPGA 
verified  that  the  system  can  indeed  remove  the  stimulus  artifacts  from  the  contaminated  neural  data  in  real  time 
and  recover  the  neural  action  potentials  that  occur  on  the  tail  end  of  the  artifact  (as  close  as  within  0.5  ms  after 
the  artifact  spike).  The  root-mean- square  (rms)  value  of  the  pre-processed  stimulus  artifact  was  reduced  on 
average  by  a  factor  of  17  (. Aplysia  californica )  and  5.3  (rat)  post-processing.  Details  of  the  HR  SAR  algorithm, 
its  FPGA  implementation  and  testing  with  prerecorded  neural  datasets  are  reported  in  a  manuscript  currently  in 
press  with  the  IEEE  Transactions  on  Biomedical  Circuits  and  Systems  (see  Appendix  I). 

Key  Research  Accomplishments 

•  Develop  a  neural  signal-processing  algorithm  for  real-time  stimulus  artifact  rejection 

•  Implement  the  algorithm  in  hardware  on  an  FPGA  for  real-time  operation 

•  Prepare  and  submit  a  manuscript  to  IEEE  Trans.  Biomedical  Circuits  and  Systems.  The  paper  is  accepted  and 
currently  in  press. 

Reportable  Outcomes 

1-  Manuscripts/Abstracts/Presentations: 

•  D.  J.  Guggenmos,  M.  Azin,  S.  Barbay,  J.  D.  Mahnken,  C.  Dunham,  P.  Mohseni,  and  R.  J.  Nudo,  “Restoration  of 
function  after  brain  damage  using  a  neural  prosthesis,”  Proc.  Natl.  Acad.  Sci.  USA  (PNAS),  in  press. 

•  K.  Limnuson,  H.  Lu,  H.  J.  Chiel,  and  P.  Mohseni,  “Real-time  stimulus  artifact  rejection  via  template  subtraction,” 
IEEE  Trans.  Biomed.  Circuits  and  Systems ,  in  press. 

•  D.  J.  Guggenmos,  C.  Dunham,  M.  Azin,  S.  Barbay,  J.  D.  Mahnken,  P.  Mohseni,  and  R.  J.  Nudo, 
“Neurophysiological  effects  of  activity-dependent  stimulation  following  a  controlled  cortical  impact  to  primary  motor 
cortex  of  the  rat,”  Program  No.  79.12 ,  2013  Neuroscience  Meeting  Planner ,  San  Diego,  CA,  Society  for 
Neuroscience,  November  2013.  Online. 

•  D.  J.  Guggenmos,  M.  Azin,  S.  Barbay,  P.  Mohseni,  and  R.  J.  Nudo,  “Activity-dependent  stimulation  drives 
functional  recovery  after  traumatic  brain  injury  in  the  rat,”  Program  No.  682.16 ,  2012  Neuroscience  Meeting 
Planner ,  New  Orleans,  LA,  Society  for  Neuroscience,  October  2012.  Online. 

2-  Patents  and  Licenses  Applied  for/issued:  None  issued  yet. 

3-  Degrees  Obtained  from  Award:  None  yet. 

4-  Development  of  Cell  Lines  and  Tissue/Serum  Repositories:  Not  applicable. 

5-  Infomatics  (Databases  and  Animal  Models):  None  yet. 

6-  Funding  Applied  for:  None  yet. 

7-  Employment/Research  Opportunities  Applied  for/Received:  None  yet. 

Conclusion 

Rapid  progress  is  being  made  toward  developing  smart  prosthetic  platforms  for  altering  plasticity  in  the 
injured  brain,  leading  to  future  therapeutic  interventions  for  TBI  that  are  guided  by  the  underlying  mechanisms 
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for  long-range  functional  and  structural  plasticity  in  the  cerebral  cortex.  An  unprecedented,  potent  effect  of 
activity-dependent  stimulation  (ADS)  on  motor  performance  has  been  demonstrated  in  rats  with  TBI.  Statistical 
analysis  of  the  data  is  complete  and  includes  both  un-implanted  and  open-loop  stimulation  control  groups.  Post- 
hoc  physiological  data  demonstrate  rapid  establishment  of  functional  connectivity  between  the  two  areas. 
Efforts  are  currently  focused  on  developing  a  revised  microsystem  that  would  enable  the  investigation  of  the 
safety  and  efficacy  of  this  approach  in  a  non-human  primate  model  of  TBI.  In  parallel,  we  have  also  established 
the  feasibility  of  hardware  implementation  of  a  neural  signal-processing  algorithm  for  real-time  elimination  of 
stimulus  artifacts  that  can  potentially  increase  the  amount  of  conditioning  performed  by  the  microsystem 
between  the  two  cortical  regions. 
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Appendix  I  K.  Limnuson,  H.  Lu,  H.  J.  Chiel,  and  P.  Mohseni,  “Real-time  stimulus  artifact  rejection  via 
template  subtraction,”  IEEE  Trans.  Biomed.  Circuits  and  Systems,  in  press. 
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This  article  has  been  accepted  for  inclusion  in  a  future  issue  of  this  journal.  Content  is  final  as  presented,  with  the  exception  of  pagination. 
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Real-Time  Stimulus  Artifact  Rejection  Via 
Template  Subtraction 

Kanokwan  Limnuson,  Student  Member,  IEEE ,  Hui  Lu,  Hillel  J.  Chiel,  and  Pedram  Mohseni,  Senior  Member,  IEEE 


Abstract — This  paper  presents  an  infinite  impulse  response  (HR) 
temporal  filtering  technique  for  real-time  stimulus  artifact  rejec¬ 
tion  (SAR)  based  on  template  subtraction.  A  system  architecture 
for  the  IIR  SAR  algorithm  is  developed,  and  the  operation  of  the 
algorithm  with  fixed-point  computation  is  analyzed  to  obtain  the 
number  of  bits  for  the  internal  nodes  of  the  system,  considering  dy¬ 
namic  range  and  fraction  length  requirements  for  optimum  perfor¬ 
mance.  Further,  memory  initialization  with  the  first  recorded  stim¬ 
ulus  artifact  is  proposed  and  shown  to  significantly  decrease  the 
IIR  system  response  time,  especially  when  artifacts  are  highly  re¬ 
producible  in  consecutive  stimulation  cycles.  The  proposed  system 
architecture  is  hardware-implemented  on  a  field-programmable 
gate  array  (FPGA)  and  tested  using  two  sets  of  prerecorded  neural 
data  from  a  rat  and  an  Aplysia  californica  (a  marine  sea  slug)  ob¬ 
tained  from  two  different  laboratories.  The  measured  results  from 
the  FPGA  verify  that  the  system  can  indeed  remove  the  stimulus 
artifacts  from  the  contaminated  neural  data  in  real  time  and  re¬ 
cover  the  neural  action  potentials  that  occur  on  the  tail  end  of  the 
artifact  (as  close  as  within  0.5  ms  after  the  artifact  spike).  The 
root-mean-square  (rms)  value  of  the  pre-processed  stimulus  arti¬ 
fact  is  reduced  on  average  by  a  factor  of  17  (Aplysia  californica) 
and  5.3  (rat)  post-processing. 

Index  Terms — Closed-loop  neuroprostheses,  field-pro¬ 
grammable  gate  array  (FPGA),  neural  recording,  neurostim¬ 
ulation,  stimulus  artifact  rejection,  template  subtraction. 


I.  Introduction 

STIMULUS  ARTIFACT  REJECTION  (SAR)  is  important 
in  biopotential  recording,  whenever  stimulation  is  per¬ 
formed  in  the  same  medium  in  which  the  recording  electrodes 
are  also  placed  [1].  This  is  because  the  large  stimulus  arti¬ 
facts  can  corrupt  or  mask  the  neural  activity  of  interest,  either 
hindering  the  analysis  of  stimulus-evoked  recorded  data  [1], 
or  limiting  the  efficacy  of  activity-dependent  stimulation  for 

Manuscript  received  February  24,  2013;  revised  May  31,  2013;  accepted  July 
14,  2013.  This  work  was  supported  by  the  Department  of  Defense  Traumatic 
Brain  Injury — Investigator-Initiated  Research  Award  Program  under  Award 
W81XWH-10-1-0741  (to  P.  Mohseni)  and  National  Institutes  of  Health  Grant 
NS047073  (to  H.  J.  Chiel).  This  paper  was  recommended  by  Associate  Editor 
E.  M.  Drakakis. 

K.  Limnuson  is  with  the  Electrical  Engineering  and  Computer  Science  De¬ 
partment,  Case  Western  Reserve  University,  Cleveland,  OH  44106  USA. 

H.  Lu  and  H.  J.  Chiel  are  with  the  Department  of  Biology,  Case  Western 
Reserve  University,  Cleveland,  OH  44106  USA  (e-mail:  hjc@case.edu). 

P.  Mohseni  is  with  the  Electrical  Engineering  and  Computer  Science  Depart¬ 
ment,  Case  Western  Reserve  University,  Cleveland,  OH  44106  USA,  and  also 
with  the  Advanced  Platform  Technology  (APT)  Center — A  Veterans  Affairs 
(VA)  Research  Center  of  Excellence,  Cleveland,  OH  44106-1702  USA  (e-mail: 
pedram.mohseni@case.edu). 

Color  versions  of  one  or  more  of  the  figures  in  this  paper  are  available  online 
at  http://ieeexplore.ieee.org. 

Digital  Object  Identifier  10.1109/TBCAS.2013.2274574 


closed-loop  operation  [2],  [3].  Many  SAR  techniques  have  been 
developed  in  the  past  that  use  the  same  fundamental  principles 
for  rejection,  and  the  choice  of  a  particular  method  is  typically 
dependent  on  the  type  of  biopotential  that  is  being  recorded 
and  the  conditions  under  which  the  recording  is  taking  place 
[4]— [7] . 

The  two  primary  classes  of  SAR  techniques  are  the  so-called 
blanking  and  subtraction  techniques.  There  are  also  some  other 
techniques  that  do  not  readily  fit  into  one  of  these  two  cate¬ 
gories  [8],  [9].  Blanking  techniques  essentially  disconnect  the 
input  of  the  recording  amplifier  during  stimulation.  Stimula¬ 
tion-synchronized  blanking  can  be  achieved  by  several  methods, 
including  grounding  the  amplifier  input  [10],  [11],  connecting 
the  amplifier  input  to  its  output  or  to  that  of  a  sample-and-hold 
circuit  [12],  [13],  digitally  replacing  the  contaminated  signal 
during  the  artifact  interval  with  an  estimate  of  the  uncontami¬ 
nated  signal  [14],  and  using  high-speed  auto-zeroing  to  maintain 
the  amplifier  output  constant  during  stimulation  [15].  In  general, 
blanking  techniques  are  relatively  simple,  effective  for  rejecting 
large  stimulus  artifacts,  practical  for  preventing  amplifier  satu¬ 
ration,  and  inherently  amenable  to  hardware  implementation  for 
real-time  SAR.  The  major  drawback  is  that  recording  is  not  vi¬ 
able  during  stimulation. 

Subtraction  techniques  basically  subtract  a  template  signal 
representative  of  the  stimulus  artifacts  from  the  contaminated 
neural  data  to  remove  the  artifacts.  These  techniques  do  not 
prevent  amplifier  saturation  on  their  own  and  often  necessitate 
running  a  digital  signal  processing  (DSP)  algorithm,  rendering 
them  much  more  complex  than  the  blanking  techniques.  The 
major  advantage  is  that  these  techniques  make  it  possible  to  re¬ 
tain  signal  information  during  stimulation. 

Generating  an  accurate  template  signal  has  been  the  main 
focus  of  research  in  subtraction-based  SAR  techniques  and  can 
be  achieved  by  several  methods,  including  artifact  modeling 
based  on  locally  fitted  cubic  polynomials  [5],  capturing  the  ar¬ 
tifact  from  subthreshold  stimulation  or  from  a  second  recording 
site  remote  from  the  stimulation  site  [1],  and  temporal  averaging 
of  the  contaminated  data  for  multiple  consecutive  stimulation 
cycles  [16],  [17],  with  the  underlying  assumption  that  the  overall 
shape,  dynamic  range,  and  timing  (e.g.,  latency  with  respect  to 
the  stimulus  timing  signal)  of  the  stimulus  artifacts  do  not  sig¬ 
nificantly  vary  with  time. 

Subtraction  techniques  have  the  potential  to  fully  eliminate 
the  artifacts  from  the  contaminated  data  record,  but  have  to  rely 
on  the  generation  of  an  accurate  template  signal  for  subtraction, 
which  in  turn  necessitates  an  adjustment  in  the  recording  ampli¬ 
fier  gain  or  stimulus  intensity  to  enable  non-saturated  recording 
of  the  full-scale  stimulus  artifact.  On  the  other  hand,  providing 
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a  low-impedance  discharge  path  for  the  stimulation  electrode 
using  active  feedback  circuitry  [18],  [19],  as  well  as  careful  de¬ 
sign  of  the  stimulator  in  terms  of  isolation  of  stimulation  chan¬ 
nels  and  parasitic  current  injection  [20]  have  been  previously 
shown  to  decrease  the  duration  and  amplitude  of  otherwise-satu¬ 
rating  stimulus  artifacts.  But  these  approaches  cannot  fully  elim¬ 
inate  the  artifacts  on  their  own,  suggesting  that  an  optimal  solu¬ 
tion  might  be  to  combine  them  with  the  subtraction  techniques. 

Since  subtraction  techniques  typically  require  a  DSP  algo¬ 
rithm  for  the  generation  of  the  template  signal,  they  have  tra¬ 
ditionally  been  implemented  offline  on  a  home-base  computer 
post-data  acquisition.  To  execute  a  subtraction-based  S  AR  algo¬ 
rithm  in  real  time  (i.e.,  as  the  recording  is  taking  place),  a  suit¬ 
able  template-generation  technique  should  be  selected  and  op¬ 
timized,  realized  in  hardware,  and  tested  with  real  neural  data, 
paving  the  way  for  ultimately  implementing  it  on  a  custom  in¬ 
tegrated  circuit  (IC). 

We  have  previously  assessed  the  feasibility  of  hardware  im¬ 
plementation  of  a  subtraction-based  SAR  algorithm  using  the 
well-established  finite  impulse  response  (FIR)  and  infinite  im¬ 
pulse  response  (HR)  temporal  filtering  techniques  for  template 
generation  [21].  Using  MATLAB™  simulations,  both  imple¬ 
mentations  were  shown  to  be  capable  of  removing  stimulus  ar¬ 
tifacts  upon  reaching  steady-state,  with  the  HR  architecture  of¬ 
fering  a  more  favorable  tradeoff  among  performance,  computa¬ 
tional  resources,  and  power  consumption  at  the  expense  of  its 
operation  speed. 

This  paper  presents  our  work  on  hardware  implementation 
of  the  HR  system  proposed  in  [21]  for  a  real-time  SAR  algo¬ 
rithm  based  on  template  subtraction.  The  paper  is  organized  as 
follows.  Section  II  describes  the  SAR  algorithm  and  the  cor¬ 
responding  HR  system  architecture,  and  Section  III  analyzes 
its  dynamic  range  and  fraction  length  requirements  to  deter¬ 
mine  the  number  of  bits  for  the  internal  nodes  of  the  system 
in  fixed-point  computation.  Section  IV  describes  the  implemen¬ 
tation  of  the  HR  SAR  algorithm  on  a  field-programmable  gate 
array  (FPGA),  and  Section  V  presents  the  measured  FPGA  re¬ 
sults  using  two  prerecorded  neural  datasets.  Finally,  Section  VI 
draws  some  conclusions  from  this  work. 
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Fig.  1.  System  architecture  for  the  HR  implementation  of  the  template 
subtraction-based  SAR  algorithm.  The  number  of  bits  in  internal  operation  of 
the  algorithm  is  also  shown. 

operation  of  the  SAR  algorithm,  as  long  as  it  is  predictable  via 
a  stimulus  timing  signal. 

An  FIR  implementation  of  (1)  was  previously  shown  to  re¬ 
quire  at  least  N  —  1  memory  rows  and  N  summations  in  each 
period  of  the  sampling  clock,  whereas  the  HR  implementation 
would  require  a  single  memory  row  and  only  three  summations 
at  the  expense  of  much  longer  system  response  time  [21].  Ini¬ 
tializing  the  memory  with  the  first  recorded  artifact  can  signif¬ 
icantly  decrease  the  HR  system  response  time  for  creating  an 
accurate  artifact  template  signal  [22].  Therefore,  this  paper  fo¬ 
cuses  on  the  HR  implementation  of  the  SAR  algorithm  with 
memory  initialization. 

Fig.  1  depicts  the  system  architecture,  comprising 
neural-recording  front-end  circuitry  for  signal  conditioning  and 
a  DSP  unit  for  executing  the  SAR  algorithm.  The  recording 
front-end  provides  ac  amplification,  dc  input  stabilization, 
bandpass  filtering,  and  10b  digitization  of  the  recorded  neural 
signal  with  fully  programmable  gain  and  bandwidth,  similar  to 
what  has  previously  been  shown  in  [3].  The  DSP  unit,  which 
is  the  focus  of  this  paper,  provides  additional  highpass  filtering 
using  an  HR  digital  filter  with  adjustable  bandwidth  to  remove 
any  residual  dc  offsets  or  low-frequency  noise,  and  performs 
real-time  stimulus  artifact  rejection  using  template  subtraction. 
Based  on  Fig.  1: 

yn  =  (1  -  K)  ■  yn _i  +  K  ■  xn  (2) 


II.  SAR  Algorithm 

To  generate  a  template  signal  representative  of  the  stimulus 
artifact,  temporal  filtering  is  employed  in  which  several  properly 
shifted  versions  of  the  input  neural  data  containing  the  stimulus 
artifacts  are  averaged.  This  is  represented  by  [21] 


N-l 

y(t )  =  a(n)  ■  x(t  -  nTsti)  (1) 

n= 0 

where  y(t)  is  the  estimated  template  signal,  x(t)  is  the  input 
neural  data  containing  the  stimulus  artifacts,  N  is  the  number  of 
stimulus  artifact  waveforms  used  for  template  estimation,  a{n) 
are  averaging  factors  that  should  sum  up  to  unity  for  the  stimulus 
artifact  and  y(t)  to  have  the  same  amplitude  (e.g.,  a(n)  factors 
can  be  all  equal  to  1  /N  for  standard  averaging),  and  Tsti  is 
the  stimulation  period.  It  should  be  noted  that  the  stimulation 
occurrence  does  not  necessarily  have  to  be  periodic  for  correct 


where  yn  is  the  new  artifact  template  signal,  yn-i  is  the  pre¬ 
vious  template  signal,  and  xn  is  the  input  neural  data.  Therefore, 
in  the  HR  implementation,  the  stimulus  artifact  template  signal 
is  retained  in  the  memory,  and  a  new  template  signal  is  gener¬ 
ated  from  the  previous  template  signal  and  the  input  neural  data 
according  to  (2),  which  is  then  subtracted  from  the  input  neural 
data.  The  factor  K  (<1)  plays  a  similar  role  to  N  in  (1),  af¬ 
fecting  the  HR  system  response  time  and  accuracy.  As  shown 
in  the  Appendix,  it  can  be  derived  from  (2)  that  the  minimum 
number  of  stimulus  artifacts,  m,  required  to  generate  an  accu¬ 
rate  template  signal  with  error  less  than,  e.g.,  0.1%  is 


3  —  log  10  |1  —  L) 
logio(l  - 


(3) 


where  Yq  is  the  initial  condition  of  the  memory  normalized  to 
the  steady-state  artifact  template  signal.  Fig.  2  shows  a  plot  of  m 
versus  Yq  for  four  different  values  of  K.  Clearly,  the  closer  the 
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Normalized  Initial  Condition  of  Memory,  YQ 

Fig.  2.  Minimum  number  of  stimulus  artifacts  required  to  generate  an  accurate 
template  signal  with  error  <0.1%  as  a  function  of  the  normalized  initial 
condition  of  the  memory  (Y0  <  1)  for  four  different  values  of  K . 
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Fig.  3.  Ll-norm  estimates  at  nodes  #1-4  for  the  two  selected  values  of  K . 


initial  condition  is  to  the  steady-state  template  signal,  the  faster 
the  system  response  time,  showing  that  the  HR  implementation 
is  particularly  effective  when  stimulus  artifacts  in  consecutive 
stimulation  cycles  are  reproducible.  In  this  work,  the  factor  K 
is  selected  to  be  either  1/16  or  1/32,  which  also  allows  imple¬ 
menting  the  multiplication-by-  K  function  via  a  shift  to  the  right 
by  4b  or  5b,  respectively,  obviating  the  need  for  digital  multi¬ 
pliers. 

It  is  worth  noting  that  the  artifact  template  generation  tech¬ 
nique  in  (2)  performed  by  the  proposed  HR  system  is  in  essence 
an  exponentially  weighted  moving  average  (EWMA)  [23], 
a  statistic  tool  with  a  rich  history  in  process  monitoring  and 
quality  control  charting  [24],  [25]  as  well  as  economics  [26] 
and  industrial  quality  control  [27].  In  this  paper,  we  utilize  a 
real-time  implementation  of  the  EWMA  for  a  novel  application 
in  neural  signal  processing.  Section  III  discusses  the  perfor¬ 
mance  of  the  HR  SAR  algorithm  with  fixed-point  computation 
and  provides  a  framework  for  determining  the  optimum  number 
of  bits  in  internal  operation  of  the  algorithm. 


III.  SAR  Algorithm  With  Fixed-Point  Computation 

When  template  calculations  are  performed  with  floating¬ 
point  precision,  similar  to  when  the  SAR  algorithm  is  executed 
offline  in  MATLAB™  on  a  home-base  computer  post-data 
acquisition,  the  output  can  be  very  accurate.  However,  for 
real-time  execution  of  the  algorithm  in  hardware,  fixed-point 
computation  is  preferred  for  simplicity,  which  then  raises  con¬ 
cerns  about  the  template  signal  accuracy  due  to  quantization 
noise.  In  this  section,  we  find  the  optimum  number  of  bits 
in  internal  operation  of  the  SAR  algorithm  by  analyzing  the 
dynamic  range  and  fraction  length  requirements. 

In  HR  systems,  the  internal  nodes  of  the  structure  can  po¬ 
tentially  overflow,  necessitating  an  adjustment  in  their  dynamic 
range  to  satisfy  the  Ll-norm  criteria  for  preventing  an  overflow 
[28]-[30].  In  Fig.  1,  consider  the  signal  path  from  the  input 
neural  data  (i.e.,  xn)  to  each  of  the  four  internal  nodes  of  the 


algorithm  (i.e.,  nodes  #1-4).  Assume  the  resulting  transfer  func¬ 
tions  and  corresponding  impulse  responses  are  Fi(z)  and  fi[n\, 
respectively.  Modeling  the  memory  block  as  a  unit  delay,  it  can 
be  shown  that 


Fi(z)  =  K 


F2(z)  = 


F3(z)  = 


F4(z)  = 


K 

Kz -1 

1  -  (1  -  iTlz-1 
K{  1  -  K)z~x 
1  -  (1  -  JQz"1  ‘ 


(4) 


Fig.  3  depicts  the  Ll-norm  estimates  of  the  four  transfer  func¬ 
tions  for  the  two  selected  values  of  K,  where  Ll-norm  is 


ll/lli  =  £l/MI- 

n= 0 


(5) 


As  can  be  seen  in  all  cases,  the  Ll-norm  estimates  are  less 
than  one,  indicating  that  no  additional  bits  (equal  to  log2  ||/||i) 
are  needed  beyond  10b  for  the  internal  nodes  to  avoid  overflow. 
The  SAR  algorithm  output  node  ( zn  =  xn  —  yn)  has  higher 
dynamic  range  of  lib  to  prevent  the  saturation  of  the  output 
after  subtraction,  in  case  of  an  overflow/underflow. 

Next,  to  assess  the  impact  of  quantization  noise  induced  by 
fixed-point  computation  on  template  signal  accuracy,  we  deter¬ 
mine  the  signal-to-noise  ratio  (SNR)  in  template  signal  gener¬ 
ation  as  a  function  of  the  fraction  length  for  the  internal  nodes 
(i.e.,  number  of  additional  bits  beyond  10b  in  a  word-length). 
Fig.  4  shows  the  simulation  structure  for  comparing  the  per¬ 
formance  of  the  SAR  algorithm  with  fixed-point  computation 
versus  that  with  floating-point  computation  by  determining  the 
SNR  [31].  Q i  and  Q2  are  two  quantizers  that  quantize  their  in¬ 
puts  to  the  word-length  value,  whereas  Q3  quantizes  its  input  to 
10b.  Fig.  5  depicts  the  simulated  SNR  and  effective  number  of 
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Fig.  4.  Simulation  structure  for  determining  the  SNR  in  template  signal 
generation. 


Fig.  5.  Simulated  SNR  and  ENOB  of  template  signal  generation  in  the  HR 
SAR  algorithm  with  fixed-point  computation  versus  the  fraction  length  for  the 
two  selected  values  of  K. 


bits,  ENOB,  in  template  signal  generation  for  the  two  selected 
values  of  K ,  where  the  SNR  is  defined  as 

SNR  =  201og10%^  (6) 

rms 

with  Sout  and  Nq  representing  the  reference  output  and  quan¬ 
tization  noise,  respectively.  Input  xn  is  taken  to  be  a  lOb-digi- 
tized  sinusoidal  signal  with  rail-to-rail  amplitude  (i.e.,  —512  to 
511  in  two’s  complement  format)  and  a  frequency  of  0.1  mHz 
to  capture  the  underlying  assumption  that  the  stimulus  artifacts 
do  not  change  rapidly  with  time.  Assuming  a  stimulation  fre¬ 
quency  of  1  Hz,  rrn’s  sampling  frequency  is  also  1  Hz.  Clearly, 
the  system  requires  a  fraction  length  of  5b  to  achieve  ~10b  ac¬ 
curacy  in  template  signal  generation  with  K  =  1/32.  A  lower 
fraction  length  would  increase  the  quantization  noise  and  de¬ 
grade  the  accuracy  to  <  10b,  whereas  a  higher  fraction  length  not 
only  would  increase  the  requisite  hardware  resources  to  support 
larger  memory  size,  but  also  would  not  offer  any  significant  ben¬ 
efit  given  that  by  design  the  overall  system  performance  would 
be  limited  by  that  of  the  neural-recording  front-end  [3],  and  not 
the  DSP  unit.  Taking  into  account  these  considerations  related 
to  dynamic  range  and  fraction  length  requirements,  the  selected 
number  of  bits  for  the  internal  operation  of  the  SAR  algorithm 
is  shown  in  Fig.  1. 


IEEE  TRANSACTIONS  ON  BIOMEDICAL  CIRCUITS  AND  SYSTEMS 

IV.  FPGA  Implementation 

The  DSP  unit  in  Fig.  1,  comprising  the  digital  highpass  filter 
(HPF)  and  the  SAR  algorithm  circuitry,  has  been  implemented 
on  an  FPGA  using  the  DE2  Development  and  Educational 
Board,  which  has  the  Cyclone  II  device  by  Altera  as  its  FPGA 
platform.  Fig.  6  depicts  the  architecture  of  the  DSP  unit  in 
FPGA  implementation,  which  incorporates  a  68b  parameter 
register,  a  digital  control  unit,  and  a  DSP  core.  The  param¬ 
eter  register  is  used  to  store  the  user- selectable  parameters 
for  system  operation  such  as  the  bandwidth  setting  of  the 
digital  HPF  and  factor  K  in  the  SAR  algorithm,  as  well  as 
memory  initialization,  memory  length,  and  output-blanking 
settings.  The  memory  length  (i.e.,  number  of  16b  samples)  is 
determined  by  the  sampling  clock  frequency  and  the  stimulus 
artifact  duration.  If  needed,  the  blanking  feature  is  used  after 
template  subtraction  to  remove  any  residual  artifacts  in  the 
output  around  the  rising  and  falling  edges  of  the  artifact  where 
it  rapidly  changes  with  time  [21].  The  parameter  register  is 
implemented  as  a  standalone  circuit  block  with  its  own  timing 
and  control  operation,  which  is  separate  from  that  of  the  other 
circuit  blocks  and  applied  externally.  This  is  because  this  block 
is  loaded  with  the  requisite  system  parameters  only  once  prior 
to  the  experiment  and  is  not  synchronously  clocked  with  the 
rest  of  the  circuit  during  SAR  algorithm  operation. 

The  digital  control  unit  incorporates  counters  and  finite-state 
machines  and  provides  timing,  path,  and  blanking  control  sig¬ 
nals  for  the  DSP  core.  The  required  inputs  for  the  digital  control 
unit  include  a  stimulus  timing  signal,  system  clock  and  sampling 
clock  signals,  and  system  parameters  such  as  memory  length, 
memory  initialization,  and  blanking  settings. 

The  DSP  core  incorporates  a  digital  HPF,  circuitry  to  exe¬ 
cute  the  SAR  algorithm,  and  parallel-to- serial  converters  at  the 
output.  The  required  inputs  for  the  DSP  core  include  the  am¬ 
plified/digitized  neural  signal  (10b),  system  clock  signal,  and 
control  signals  provided  by  the  digital  control  unit.  Fig.  6  also 
shows  the  structure  of  the  digital  HPF  and  SAR  algorithm  cir¬ 
cuitry  in  the  DSP  core  as  implemented  on  the  FPGA.  The  ampli¬ 
fied/digitized  input  neural  signal  is  first  highpass  filtered  using 
a  lst-order,  HR  filter  with  direct  form  II  architecture.  Factor  K\ 
is  the  user-selected  HPF  coefficient  that  controls  the  filter  band¬ 
width  and  is  selected  judiciously  to  perform  the  filtering  using 
arithmetic  shifts,  subtraction  and  addition  only,  with  no  need 
for  digital  multipliers  or  dividers  [3].  The  user  can  set  K\  to 
be  either  1/16  or  1/8,  which  results  in  a  filter  cutoff  frequency 
of  366  Hz  or  756  Hz,  respectively,  from  a  1-MHz  system  clock. 
Since  the  digitized  data  at  the  analog-to-digital  converter  (ADC) 
output  are  unsigned  numbers  (10b),  a  factor  of  5 12  is  subtracted 
from  the  input  signal  to  convert  it  to  two’s  complement  format 
for  further  processing.  In  addition,  an  overflow/underflow  de¬ 
tector  is  used  at  the  HPF  output  to  limit  its  dynamic  range  to 
10b  before  feeding  it  to  the  SAR  algorithm  circuitry. 

The  SAR  algorithm  only  operates  for  the  duration  of  each 
stimulus  artifact.  The  digitized/filtered  sample  at  the  output  of 
the  HPF  filter  (10b)  is  first  converted  to  15b  via  a  shift  to  the 
left  by  5b  and  then  multiplied  by  factor  K 2  (same  as  K  in 
Fig.  1)  stored  in  the  parameter  register.  Next,  the  memory  data 
containing  the  previous  template  signal  are  read,  multiplied  by 
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Fig.  6.  Architecture  of  the  DSP  unit  (top)  and  structure  of  the  digital  HPF  and  SAR  algorithm  circuitry  in  the  DSP  core  (bottom)  as  implemented  on  the  FPGA. 


(1  —  K2),  and  added  to  ( K 2  •  xn)  to  obtain  the  new  template 
signal  (15b),  which  is  written  back  into  the  memory  for  the 
next  cycle.  The  new  template  signal  is  also  converted  back  to 
10b  and  subsequently  subtracted  from  the  10b  digitized/filtered 
input  sample  to  produce  the  SAR  algorithm  output  signal.  Out¬ 
side  the  duration  of  the  stimulus  artifact,  the  SAR  algorithm  cir¬ 
cuitry  is  disabled  and  the  digitized/filtered  sample  at  the  HPF 
output  is  directly  passed  to  the  output  register. 

The  path  control  signal  from  the  digital  control  unit  manages 
the  memory  initialization.  Specifically,  if  the  recorded  stimulus 
artifact  is  the  first  artifact,  indicated  as  such  by  the  stimulus 
timing  signal,  the  path  control  signal  routes  the  15b  sample  di¬ 
rectly  to  the  memory  input  for  its  initialization.  With  the  next 
indication  of  stimulation  by  the  stimulus  timing  signal,  the  HR 
system  executes  the  SAR  algorithm  as  previously  described.  If 
the  memory  initialization  setting  is  not  enabled  by  the  user,  the 
memory  can  be  cleared  to  start  with  zero  internal  values,  but 
this  would  increase  the  HR  system  response  time  as  previously 
shown  in  Fig.  2. 

The  16b,  4K  memory  is  implemented  using  the  internal 
SRAM  of  the  FPGA.  Even  parity  is  used  to  check  for  memory 
error,  which  is  generated  by  an  XOR  function  of  all  the  bits  in 
each  15b  sample.  The  parity  bit  is  then  added  to  the  end  of  the 
data  bits  before  being  written  into  the  memory  as  a  16b  sample. 
When  the  memory  data  are  read  out,  a  parity  checker  checks 
for  memory  error,  and  this  information  is  sent  to  the  output. 
The  15b  sample  is  also  sent  to  the  rest  of  the  SAR  algorithm 
circuitry  for  template  generation.  Including  the  memory  parity 
check  feature,  while  not  entirely  necessary  for  an  FPGA-based 
system,  would  streamline  the  design  translation  from  an  FPGA 
to  an  IC  platform  in  the  future. 

The  blanking  control  signal,  which  is  also  received  from  the 
digital  control  unit,  is  used  to  remove  any  residual  artifacts  in 
the  output  after  template  subtraction.  Specifically,  this  control 
signal  activates  a  multiplexer  that  replaces  the  output  data  with 
“0”  for  the  time  period  in  which  blanking  is  applied,  which  is 
normally  at  the  rising  and  falling  edges  of  the  artifact  where  it 


rapidly  changes  with  time.  The  user  can  independently  set  the 
blanking  duration  around  the  rising  and  falling  edges  from  0 
(i.e.,  no  blanking)  to  2,047  data  points. 

The  three  registers  in  Fig.  6  are  used  for  pipelining  in  order 
to  overlap  the  processing  in  each  stage  and  prevent  harmful  race 
conditions  with  proper  timing  control.  Further,  since  the  SAR 
algorithm  circuitry  operates  synchronously  with  a  system  clock, 
all  circuit  blocks  (except  the  parameter  register)  share  the  same 
system  clock  signal  globally  and  use  a  local  Enable  signal  for 
synchronization  [32]. 

V.  FPGA  Measurement  Results 

The  DSP  unit  as  depicted  in  Fig.  6  has  been  synthesized  and 
mapped  to  the  Cyclone  II  FPGA,  EP2C35F672C6,  using  Al¬ 
tera'  s  Quartus  II  design  software.  The  mapped  circuitry  con¬ 
sumed  2%  (656)  of  the  total  available  logic  elements  (LEs)  and 
14%  (65,536)  of  the  total  available  memory  bits.  The  DE2  board 
was  programmed  and  connected  to  a  digital  data  acquisition 
(DAQ)  card,  NI  6541,  which  provided  the  input  signal  to  the 
FPGA  and  recorded  the  output  waveforms.  The  system  clock 
was  applied  to  the  FPGA  using  the  onboard  external  clock  port, 
and  a  supply  of  9  V  was  used  to  power  up  the  board  with  its 
input-output  (I/O)  ports  at  3.3  V.  For  all  FPGA  measurements 
described  below,  factors  K\  and  K2  (see  Fig.  6)  were  both  set 
to  1/16. 

Two  sets  of  prerecorded  neural  data  from  two  different  labo¬ 
ratories  were  used  to  experimentally  verify  the  operation  of  the 
HR  SAR  algorithm  and  its  FPGA  implementation.  Specifically, 
a  294- s  window  of  prerecorded  neural  data  from  a  rat  was  used 
as  the  first  dataset.  The  rat  data  were  sampled  at  ~24.4  kHz 
and  obtained  during  4-Hz  cortical  stimulation.  A  gain  of  520 
(~54.3  dB)  was  applied  to  the  neural  data  before  feeding  it  to 
the  FPGA.  The  SAR  algorithm  was  set  to  operate  for  5  ms  upon 
receiving  an  indication  of  stimulation  by  the  stimulus  timing 
signal,  and  no  output  blanking  was  applied. 

A  125-s  window  of  prerecorded  data  from  an  Aplysia  calif or- 
nica  (a  marine  sea  slug)  was  used  as  the  second  neural  dataset. 
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Fig.  7.  FPGA  measurement  results  using  prerecorded  neural  data  from  a  laboratory  rat.  (a)  Top  plot  shows  a  294-s  window  of  the  input  data  to  the  FPGA.  Middle 
plot  depicts  the  generated  stimulus  artifact  template  signal,  whereas  the  bottom  plot  shows  the  HR  system  output  from  the  FPGA.  Two  5-ms  snapshots  of  the 
waveforms  are  shown  at  (b)  t  208  s  and  (c)  t  256  s. 


The  Aplysia  data  were  sampled  at  2  kHz  and  obtained  during 
0.5-Hz  stimulation.  A  gain  of  1,000  (60  dB)  was  applied  to  the 
neural  data  before  feeding  it  to  the  FPGA.  Upon  receiving  an  in¬ 
dication  of  stimulation  by  the  stimulus  timing  signal,  the  SAR 
algorithm  was  set  to  operate  for  96  ms  (the  duration  of  stim¬ 
ulus  artifact  in  the  Aplysia  dataset  was  much  longer  than  that  in 
the  rat  dataset),  and  output  blanking  was  set  to  occur  for  4  ms 
synchronized  with  the  rising  and  falling  edges  of  the  stimulus 
timing  signal.  The  applied  gain  values  represented  those  pre¬ 
viously  obtained  with  our  neural-recording  front-end  operating 
from  1.5  V  [3].  The  gain  values  were  high  enough  to  achieve 
sufficient  resolution  at  the  DSP  unit  input,  while  keeping  the 
amplitude  of  the  amplified  neural  data  below  1.5  Vpp. 

Fig.  7  shows  the  FPGA  measurement  results  using  the  rat 
neural  dataset.  The  top  plot  in  (a)  depicts  the  input  neural  data 
to  the  FPGA,  consisting  of  neural  spikes  buried  in  large  stim¬ 
ulus  artifacts.  The  middle  plot  shows  the  generated  artifact  tem¬ 
plate  signal  after  memory  initialization  as  previously  described. 
Note  the  fast  response  time  of  the  HR  SAR  algorithm  in  quickly 
generating  the  template  signal  even  for  the  initial  stimulus  ar¬ 
tifacts,  as  well  as  how  fast  the  generated  template  signal  tracks 
the  variation  in  stimulus  artifact  amplitude  in  the  first  100  sec¬ 
onds.  The  bottom  plot  depicts  the  HR  system  output  from  the 
FPGA  in  which  the  large  stimulus  artifacts  are  rejected  and  the 
neural  data  recovered  in  real  time. 


Fig.  7(b)  and  (c)  depict  5-ms  snapshots  of  the  waveforms  at 
t  =  ~208  s  and  ~256  s,  respectively,  demonstrating  that  the 
system  is  fully  capable  of  recovering  neural  action  potentials 
that  occur  on  the  tail  end  of  the  artifact  [see  Fig.  7(c)]  or  appear 
as  close  as  within  0.5  ms  after  the  artifact  spike  [see  Fig.  7(b)]. 

The  slight  discrepancy  between  the  amplitude  of  the  input 
artifact  and  that  of  the  template  signal  is  because  the  template 
signal  actually  represents  the  highpass  filtered  artifact. 

Fig.  8  shows  a  5-s  snapshot  of  the  waveforms  in  Fig.  7(a) 
around  the  onset  of  stimulation  and  their  corresponding 
spectrograms  obtained  using  1,024-sample  windows  with 
1,000-sample  overlap.  As  can  be  seen  in  the  top  and  middle 
spectrograms,  the  artifacts  in  the  rat  neural  dataset  have  strong 
frequency  components  below  5  kHz  that  are  significantly  re¬ 
duced  in  the  output  (see  the  bottom  spectrogram),  allowing  the 
weaker  neural  activity  to  emerge  from  the  large  artifacts.  For 
the  very  first  stimulus  artifact  at  just  prior  to  t  =  2.5  s,  which 
is  the  one  loaded  into  the  memory  for  its  initialization,  the 
corresponding  template  signal  would  be  1/1 6th  of  the  artifact 
according  to  (2),  and  therefore  15/ 16th  of  the  artifact  appears  in 
the  output  data  after  subtraction.  The  HR  SAR  algorithm  then 
removes  all  the  subsequent  stimulus  artifacts  starting  with  the 
second  one.  If  present,  artifact  residuals  as  seen  in  Figs.  7(b) 
and  (c)  in  the  time  domain  and  Fig.  8  in  the  frequency  domain 
(bottom  spectrogram)  are  now  insignificant  as  compared  to  the 
neural  action  potentials. 
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Fig.  10.  Top  plot  shows  a  96-ms  portion  of  the  Aplysia  neural  dataset,  showing 
a  total  of  61  unfiltered  stimulus  artifacts  superimposed  on  each  other  with  some 
action  potentials  riding  on  the  tail  end  of  the  artifacts.  Middle  plot  depicts 
the  61  stimulus  artifact  templates  superimposed  on  each  other,  which  actually 
represent  the  highpass  filtered  artifacts  (not  shown).  Bottom  plot  shows  the 
artifact-free  FPGA  output  in  which  the  neural  spikes  are  recovered  after 
template  subtraction.  Residual  artifacts  are  also  simultaneously  removed  after 
4-ms  blanking  (arrows).  Note  the  smaller  dynamic  range  of  the  Y-axis  in  the 
Low  bottom  plot  after  artifact  removal  and  residual  blanking. 


Fig.  8.  A  5-s  snapshot  of  the  FPGA  measurement  results  using  the  prerecorded 
rat  neural  dataset  and  their  corresponding  spectrograms.  The  5-s  snapshot  is 
taken  around  the  stimulus  onset. 


FPGA  Input 


FPGA  Output 


Fig.  9.  FPGA  measurement  results  using  the  prerecorded  Aplysia  neural 
dataset  and  their  corresponding  spectrograms. 


Fig.  9  shows  the  FPGA  measurement  results  using  the 
Aplysia  neural  dataset  and  their  corresponding  spectrograms. 
The  top  plot  depicts  the  input  neural  data  to  the  FPGA,  con¬ 
taining  many  large  stimulus  artifacts  that  occur  at  0.5  Hz  and 
bursts  of  extracellular  neural  activity  that  occur  in  between 
and  occasionally  on  the  tail  end  of  the  artifacts.  The  middle 
plot  shows  the  generated  artifact  template  signal  and  its  spec¬ 
trogram,  indicating  that  the  artifacts  have  their  frequency 
components  spread  throughout  a  1-kHz  bandwidth  with  strong 
frequency  components  contained  below  200  Hz.  The  bottom 
plot  shows  the  FPGA  output  data  after  blanking  in  which  all 


stimulus  artifacts  (minus  the  first  one  as  explained  previously) 
are  successfully  removed  from  the  recorded  data  in  real  time  to 
recover  the  neural  activity. 

Fig.  10  shows  a  close-up  view  of  the  waveforms  during  the 
96-ms  period  of  operation  for  the  SAR  algorithm.  The  top 
plot  depicts  61  unfiltered  stimulus  artifacts  superimposed  on 
each  other  (i.e.,  all  the  artifacts  present  in  the  125-s  window 
of  Aplysia  neural  dataset  minus  the  very  first  one),  with  some 
action  potentials  also  occurring  on  the  tail  end  of  the  artifacts. 
The  middle  plot  shows  the  corresponding  artifact  templates 
superimposed  on  each  other,  whereas  the  bottom  plot  depicts 
the  artifact-free  HR  system  output  from  the  FPGA  after  tem¬ 
plate  subtraction  and  4-ms  blanking  (arrows)  for  simultaneous 
removal  of  the  artifacts  and  artifact  residuals,  respectively, 
demonstrating  successful  operation  of  the  algorithm  and  its 
hardware  implementation. 

In  order  to  assess  the  performance  of  the  HR  SAR  algorithm 
and  its  hardware  implementation  in  a  quantitative  manner,  a 
total  of  908  stimulus  artifacts  (54  of  62  and  854  of  1,000  ar¬ 
tifacts  in  the  Aplysia  and  rat  neural  datasets,  respectively)  were 
analyzed.  Specifically,  the  mean  and  standard  deviation  of  the 
root-mean- square  (rms)  values  of  the  artifacts  were  computed 
pre-  and  post-processing  by  the  FPGA. 

The  analysis  excluded  the  very  first  artifact  in  each  neural 
dataset  and  those  artifacts  that  had  action  potentials  present  any¬ 
where  in  their  duration  over  which  the  algorithm  was  operating 
(96  ms  and  5  ms  for  the  Aplysia  and  rat  artifacts,  respectively). 
This  ensured  that  the  occasional  presence  of  action  potentials 
did  not  confound  the  analysis.  The  same  statistics  were  also  ob¬ 
tained  from  segments  of  the  FPGA  output  that  represented  pure 
noise  (i.e.,  absence  of  both  action  potentials  and  artifact  resid¬ 
uals).  Table  I  tabulates  the  results  of  this  analysis.  In  the  case  of 
Aplysia  neural  dataset  that  contains  relatively  stationary  stim¬ 
ulus  artifacts  (see  the  top  plot  in  Fig.  9  and  note  the  small  stan¬ 
dard  deviation  value  in  Table  I),  the  rms  value  of  the  artifact 
on  average  is  reduced  by  a  factor  of  17,  resulting  in  post-pro- 
cessed  rms  values  that  are  at  the  level  of  that  for  the  output 
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TABLE  I 

Statistics  of  Pre-  and  Post-Processed  Stimulus  Artifacts 


Aplysia  calif ornica  (54  of  62  SAs) 


Mean  (nVms) 

SD  (nV^) 

Pre-Processing 

68.33 

1.21 

Post-Processing 

4.01 

0.68 

Output  Noise 

3.83 

0.16 

Rat  (854  of  1,000  SAs) 


Pre-Processing 

115.74 

21.72 

Post-Processing 

21.65 

16.70 

Output  Noise 

5.03 

0.46 

Fig.  11.  Root-mean-square  (rms)  value  of  the  stimulus  artifacts  (854  of  1,000) 
in  the  rat  neural  dataset  pre-  and  post-processing  by  the  FPGA.  The  dashed  line 
represents  an  average  rms  value  of  5.03  ^  V  for  the  output  noise  obtained  from 
10  different  5-ms  segments  that  did  not  contain  any  action  potentials  or  artifact 
residuals. 


noise.  In  the  case  of  rat  neural  dataset  that  contains  both  sta¬ 
tionary  and  non- stationary  artifacts  (see  the  top  plot  in  Fig.  7(a) 
and  note  the  larger  standard  deviation  value  in  Table  I),  the  re¬ 
duction  in  the  rms  value  on  average  is  more  modest  (a  factor  of 
5.3).  A  closer  look  at  the  rms  values  of  individual  stimulus  ar¬ 
tifacts  pre-  and  post-processing  reveals  that  the  degradation  of 
performance  is  limited  to  when  there  is  a  sudden  change  in  the 
artifacts  (see  Fig.  1 1  and  compare  its  trend  with  how  the  arti¬ 
facts  are  changing  in  the  top  plot  of  Fig.  7(a)),  whereas  the  rms 
values  of  the  post-processed  artifacts  indeed  approach  that  of 
the  output  noise  when  the  artifacts  are  relatively  stationary. 

VI.  Conclusion 

This  paper  reported  on  a  neural  signal-processing  algorithm 
for  real-time  stimulus  artifact  rejection  (SAR)  in  which  a  high- 
fidelity  template  signal  representative  of  the  stimulus  artifacts 
was  first  generated  via  temporal  filtering  and  subsequently  sub¬ 
tracted  from  the  contaminated  neural  data  to  remove  the  arti¬ 
facts.  A  system  architecture  for  the  HR  implementation  of  the 
algorithm  was  realized  in  hardware  on  an  FPGA  platform,  fea¬ 
turing  memory  initialization  as  a  simple  method  to  significantly 
decrease  the  HR  system  response  time  for  accurate  template 
generation.  The  measured  FPGA  results  using  two  sets  of  prere¬ 
corded  neural  data  from  a  rat  and  an  Aplysia  calif  ornica  verified 
the  functionality  of  the  algorithm  and  its  hardware  implementa¬ 
tion  by  removing  the  stimulus  artifacts  in  real  time  from  the  con¬ 
taminated  recorded  data  and  recovering  the  extracellular  neural 
activity. 

The  major  advantage  of  this  approach  as  compared  to  the 
blanking  techniques  (i.e.,  disconnecting  the  recording  amplifier 


input  during  stimulation)  is  that  it  has  the  potential  to  retain 
signal  information  during  stimulation  while  fully  eliminating 
the  artifacts  from  the  contaminated  data  record  in  real  time.  On 
the  other  hand,  one  limitation  of  this  approach  is  that  it  does  not 
directly  address  the  problem  of  amplifier  saturation  and  hence 
becomes  less  effective  with  prolonged  amplifier  saturation,  un¬ 
less  care  is  taken  in  the  design  of  the  recording  and  stimulating 
circuitry  to  prevent  (or  at  least  minimize)  amplifier  saturation  by 
decreasing  the  duration  and  amplitude  of  the  artifacts  [18]— [20]. 
Another  limitation  of  this  approach  is  that  if  neural  activity  oc¬ 
curs  on  the  tail  end  of  the  artifact  and  is  time-locked  to  stim¬ 
ulation,  it  will  be  removed  by  the  system  along  with  the  arti¬ 
facts.  Similarly,  if  neural  activity  occurs  during  the  rising/falling 
edges  of  the  artifact  spike,  it  will  be  lost,  because  it  will  be  either 
blanked  out  by  the  system  or  heavily  distorted  by  the  residuals 
with  no  blanking. 

This  technique  can  potentially  handle  other  stimulation  sce¬ 
narios  as  well,  given  that  it  only  needs  the  stimulus  timing  signal 
information  for  correct  operation.  For  example,  if  stimulation 
occurs  simultaneously  on  two  electrodes,  a  combined  stimulus 
artifact  might  appear  on  the  recording  electrode  that  can  be  re¬ 
moved  even  by  the  current  system.  If  stimulation  occurs  alter¬ 
nately  on  two  electrodes,  two  different  stimulus  artifact  types 
might  appear  alternately  as  well  on  the  recording  electrode  and 
can  be  removed  by  modifying  the  timing  operation  of  the  system 
to  handle  each  artifact  type  independently,  if  there  is  no  tem¬ 
poral  overlap  between  the  artifacts.  Ultimately,  a  tradeoff  exists 
between  functional  versatility  and  system  operation  complexity. 

Finally,  given  the  relatively  low  system  clock  frequency  of  <  1 
MHz  in  this  work  and  that  the  synthesized  algorithm  utilized  a 
very  small  percentage  of  the  available  FPGA  resources,  it  was 
not  readily  feasible  to  accurately  determine  the  power  consump¬ 
tion  in  hardware  implementation.  Efforts  are  currently  under 
way  for  custom  implementation  of  the  DSP  unit  in  Fig.  1  on 
an  IC  that  would  also  incorporate  recording  front-end  and  stim¬ 
ulating  back-end  circuitry  adapted  from  [3]  to  form  a  complete 
system.  To  that  end,  our  preliminary  work  shows  that  the  DSP 
unit  can  be  implemented  with  a  total  area  of  3.64  mm2  (89%  oc¬ 
cupied  by  the  16b,  4K  SRAM)  in  0.35-/xm  CMOS  technology 
with  power  consumption  on  the  order  of  low-tens  of  microwatts 
from  1.5  V  (1-MHz  system  clock),  indicating  the  feasibility  of 
running  the  algorithm  on  a  miniaturized,  integrated  device  in  the 
near  future. 

Appendix 

In  this  Appendix,  we  show  the  derivation  of  (3)  in  Section  II: 
SAR  Algorithm.  As  previously  stated,  based  on  Fig.  1 

yn  =  (1-  K)-  yn_ i  +  K  ■  xn  (Al) 

where  n  =  1,2, 3,....  Hence,  it  is  simple  to  see  that 

Vi  =  (1  -  K)  •  y0  +  K  •  xx 
U2  =  (1  -  K)  ■  yi  +  K  ■  x2 

=  (1  —  K •  yo  +  K  •  (1  —  K )  •  x\  +  K  •  X2  (A2) 
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which  means  that  the  template  signal  for  the  rath  artifact  can  be  University,  Cleveland,  OH,  USA,  for  helpful  discussions  that 
written  as  made  this  work  possible. 


Vm  =  (1  -  K)m  -yo  +  K  -[Xm 

+(l  -  K)  •  +•••  +  (  1  —  K)m~l  •  :r  \  ]  (A3) 
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where  yo  is  the  initial  condition  of  the  memory.  Assume  that 
x\  yX2, . . .  xm  are  all  equal  to  the  steady-state  artifact  template 
signal,  yss.  Therefore 


—  =  (1-K)m-Y0+K-[1+(1-K)+-  ■  ■+(l-K)m-1]  (A4) 

Vss 

where  To  =  (yo)/(yss)  is  the  initial  condition  of  the  memory 
normalized  to  the  steady- state  artifact  template  signal.  Given  the 
sum  of  geometric  series,  it  can  be  shown  that 


i  +  (i  -  k)  +  ■  ■  ■  +  (i  -  jrp-1 

_  1  -  (1  -  K)m 
~  1  -  (1  -  AT) 

_  1  -  (1  -  K)m 
~  K 

which  means  that  (A4)  can  be  simplified  to 


(A5) 


y^  =  h-K)m-Y0  +  l-(l-K)m.  (A6) 

Vss 

If  To  <  1,  for  generating  an  accurate  template  signal  with  error 
less  than,  e.g.,  0.1%,  one  needs  to  have  (ym,) / (y8s)  >  0.999, 
which  means  (1  —  Af)m.(  1  —  To)  <  0.001  from  (A6).  Taking  a 
logarithm  of  both  sides  and  noting  that  log10(l  —  K)  <  0,  one 
can  obtain 


ra  > 


-3  -  log10(l  -  Y0) 
logio(l-*0 


(A7) 


If  To  >  1,  for  generating  an  accurate  template  signal  with  error 
less  than  0.1%,  one  needs  to  have  ( ym)/(ySs )  <  1.001,  which 
ultimately  leads  to 


-3  -  logioQo  -  1) 
log10(l  -  K )  ' 


(A8) 
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